Clinical statistics for non-statisticians: Day two

Steve Simon

Re-introduce yourself

Here’s one more interesting number about myself

  • 51: I started running when I was 51.

Tell us one more interesting number about yourself.

Outline of the three day course

  • Day one: Numerical summaries and data visualization
  • Day two: Hypothesis testing and sampling
  • Day three: Statistical tests to compare treatment to a control and regression models

My goal: help you to become a better consumer of statistics

Day two topics

  • Hypothesis testing
    • What does a p-value tell you
    • Why you might prefer a confidence interval
    • What sample size do you need
    • How does a Bayesian data analysis differ
    • What should you do if you do not have a hypothesis to test

Day two topics (continued)

  • Sampling
    • What do you gain with a random sample
    • When might you prefer a non-random sample
    • When should you use randomization or blinding
    • What are the benefits of matching
    • How can you ensure that your sampling approach is ethical

Bad quiz question

A research paper computes a p-value of 0.45. How would you interpret this p-value?

  1. Strong evidence for the null hypothesis
  2. Strong evidence for the alternative hypothesis
  3. Little or no evidence for the null hypothesis
  4. Little or no evidence for the alternative hypothesis
  5. More than one answer above is correct.
  6. I do not know the answer.

A bad confidence interval

A research paper computes a confidence interval for a relative risk of 0.82 to 3.94. This confidence interval tells you that the result is

  1. statistically significant and clinically important.
  2. not statistically significant, but is clinically important.
  3. statistically significant, but not clinically important.
  4. not statistically significant, and not clinically important.
  5. The result is ambiguous.
  6. I do not know the answer.

Bayesian question

A Bayesian data analysis can incorporate subjective opinions through the use of

  1. data shrinkage.
  2. a prior distribution.
  3. a posterior distribution.
  4. p-values.
  5. I do not know the answer.

P-values

  • Most commonly reported statistic
    • Also sharply criticized
    • Requires a research hypothesis
  • Two alternatives
    • Confidence intervals
    • Bayesian analysis
  • What to do when no research hypothesis

What is a population?

  • Population: a group that you wish to generalize your research results to. It is defined in terms of
    • Demography,
    • Geography,
    • Occupation,
    • Time,
    • Care requirements,
    • Diagnosis,
    • Or some combination of the above.

Example of a population

All infants born in the state of Missouri during the 1995 calendar year who have one or more visits to the emergency room during their first year of life.

What is a sample?

  • Sample: subset of a population.
  • Random sample: every person has the same probability of being in the sample.
  • Biased sample: Some people have a decreased probability of being in the sample.
    • Always ask “who was left out?”

An example of a biased sample

  • A researcher wants to characterize illicit drug use in teenagers. She distributes a questionnaire to students attending a local public high school.
  • (In the U.S., high school is grades 9-12, which is mostly students from ages 14 to 18.)
  • Explain how this sample is biased.
  • Who has a decreased or even zero probability of being selected?

Type your ideas in the chat box.

Fixing a biased sample

  • Redefine your population
    • Not all teenagers,
      • but those attending public high schools.

What is a parameter?

  • A parameter is a number computed from a population.
    • Examples
      • Average health care cost associated with the 29,637 children with one or more ER visits.
      • Proportion of these 29,637 children who died in their first year of life.
    • Usually unknown
    • Designated by Greek letters (\(\mu\), \(\pi\), \(\rho\))

What is a statistic?

  • A statistic is a number computed from a sample
    • Examples
      • Average health care cost associated with 100 children sampled from a local hospital.
      • Proportion of these 100 children who died in their first year of life.
    • Usually known
    • Designated by non-Greek letters (\(\bar{X}\), \(\hat{p}\), r).
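
The difference between a parameter and a statistic can be sketched with a small simulation. The population below is entirely made up (random costs for 29,637 children, with an average near 1,000 dollars); only the population size echoes the example above. The names `population`, `mu`, and `x_bar` are illustrative.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated costs (in dollars) for 29,637 children.
population = [rng.expovariate(1 / 1000) for _ in range(29_637)]

# Parameter: computed from the whole population (usually unknown in practice).
mu = statistics.mean(population)

# Statistic: computed from a random sample of 100 children.
sample = rng.sample(population, 100)
x_bar = statistics.mean(sample)

print(f"parameter mu = {mu:.0f}, statistic x_bar = {x_bar:.0f}")
```

The statistic will usually land near the parameter but not exactly on it. That gap is the sampling error that the rest of today's material tries to quantify.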

What is Statistics?

  • Statistics
    • The use of information from a sample (a statistic) to make inferences about a population (a parameter)
      • Often a comparison of two populations

What is the null hypothesis?

  • The null hypothesis (\(H_0\)) is a statement about a parameter.
  • It implies no difference, no change, or no relationship.
    • Examples
      • \(H_0:\ \mu_1 - \mu_2 = 0\)
      • \(H_0:\ \pi_1 - \pi_2 = 0\)
      • \(H_0:\ \rho = 0\)

What is the alternative hypothesis?

  • The alternative hypothesis (\(H_1\) or \(H_a\)) implies a difference, change, or relationship.
    • Examples
      • \(H_1:\ \mu_1 - \mu_2 \ne 0\)
      • \(H_1:\ \pi_1 - \pi_2 \ne 0\)
      • \(H_1:\ \rho \ne 0\)

Hypothesis in English instead of Greek

  • Only statisticians like Greek letters
    • Translate to simple text
    • For two group comparisons
      • Safer, more effective
    • For regression models
      • Trend, association

Example of text hypotheses (1 of 2)

  • “… the objective of this 78-week randomised, placebo-controlled study was to determine whether treatment with nilvadipine sustained-release 8 mg, once a day, was effective and safe in slowing the rate of cognitive decline in patients with mild to moderate Alzheimer disease.”
    • Lawlor B, Segurado R, Kennelly S, et al. Nilvadipine in mild to moderate Alzheimer disease: A randomised controlled trial. PLoS Med. 2018; 15(9): e1002660. DOI: 10.1371/journal.pmed.1002660

Example of text hypotheses (2 of 2)

  • “… we investigated trends in BCC incidence over a span of 20 years and the associations between incident BCC and risk factors in a total population of 140,171 participants from 2 large US-based cohort studies: women in the Nurses’ Health Study (NHS; 1986–2006) and men in the Health Professionals’ Follow-up Study (HPFS; 1988–2006).”
    • Wu S, Han J, Li WQ, Li T, Qureshi AA. Basal-cell carcinoma incidence and associated risk factors in U.S. women and men. Am J Epidemiol. 2013; 178(6): 890–897. DOI: 10.1093/aje/kwt073

Use PICO

  • P = patient population
  • I = intervention
  • C = control
  • O = outcome

One-sided alternatives

  • Examples
    • \(H_1:\ \mu_1 - \mu_2 \gt 0\)
    • \(H_1:\ \pi_1 - \pi_2 \gt 0\)
    • \(H_1:\ \rho \gt 0\)
  • Changes in only one direction expected
  • Changes in opposite direction uninteresting

Passive smoking controversy

  • EPA meta-analysis of passive smoking
    • Criticized for using a one-sided hypothesis
    • Samet JM, Burke TA. Turning science into junk: the tobacco industry and passive smoking. Am J Public Health. 2001;91(11):1742–1744.

What is a decision rule? (1/3)

  • Example
    • \(H_0:\ \mu_1 - \mu_2 = 0\)
    • \(H_1:\ \mu_1 - \mu_2 \ne 0\)
    • t = (\(\bar{X}_1-\bar{X}_2\)) / se
    • Accept \(H_0\) if t is close to zero.

What is a decision rule? (2/3)

  • Example
    • \(H_0:\ \pi_1 - \pi_2 = 0\)
    • \(H_1:\ \pi_1 - \pi_2 \ne 0\)
    • t = (\(\hat{p}_1-\hat{p}_2\)) / se
    • Accept \(H_0\) if t is close to zero.

What is a decision rule? (3/3)

  • Example
    • \(H_0:\ \rho = 0\)
    • \(H_1:\ \rho \ne 0\)
    • t = r / se
    • Accept \(H_0\) if t is close to zero.
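
A minimal sketch of the first decision rule (comparing two means), assuming a pooled standard deviation for the standard error. The data are hypothetical, and the cutoff of 2.306 is the two-sided 5% value from a t table with 8 degrees of freedom; the function names are illustrative.

```python
import math
import statistics

def t_statistic(x1, x2):
    """Return (difference in means) / se, using a pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x1) - statistics.mean(x2)) / se

def decide(t, critical_value):
    """Accept H0 when t is close to zero, otherwise reject it."""
    return "accept H0" if abs(t) < critical_value else "reject H0"

# Hypothetical measurements for two groups of five patients each.
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]

t = t_statistic(treatment, control)
# 2.306 is the two-sided 5% cutoff from a t table with 8 degrees of freedom.
decision = decide(t, critical_value=2.306)
print(f"t = {t:.2f}: {decision}")
```

Here "close to zero" is made precise by the critical value: the larger the sample, the smaller the cutoff needs to be.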

What is a Type I error?

  • A Type I error is rejecting the null hypothesis when the null hypothesis is true
    • False positive
    • Example involving drug approval: a Type I error is allowing an ineffective drug onto the market.
  • \(\alpha\) = P[Type I error]

What is a Type II error?

  • A Type II error is accepting the null hypothesis when the null hypothesis is false.
    • False negative result
    • An example involving drug approval: a Type II error is keeping an effective drug off of the market.
  • \(\beta\) = P[Type II error]
  • Power = \(1-\beta\)
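
One way to get a feel for \(\alpha\), \(\beta\), and power is to simulate many studies. This sketch makes simplifying assumptions that are not in the slides: normally distributed outcomes, a known standard deviation of 1, 50 patients per group, and a true difference of 0.5 when the null hypothesis is false.

```python
import random
from statistics import NormalDist, mean

def rejects_null(rng, true_diff, n=50, sigma=1.0, alpha=0.05):
    """Simulate one two-group study and report whether H0 is rejected."""
    group1 = [rng.gauss(true_diff, sigma) for _ in range(n)]
    group2 = [rng.gauss(0.0, sigma) for _ in range(n)]
    se = sigma * (2 / n) ** 0.5  # sigma is treated as known, for simplicity
    z = (mean(group1) - mean(group2)) / se
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

def rejection_rate(true_diff, n_studies=2000, seed=1):
    """Proportion of simulated studies that reject H0."""
    rng = random.Random(seed)
    return sum(rejects_null(rng, true_diff) for _ in range(n_studies)) / n_studies

type_1_rate = rejection_rate(true_diff=0.0)  # null is true: rejections are false positives
power = rejection_rate(true_diff=0.5)        # null is false: rejections are correct
type_2_rate = 1 - power                      # null is false: non-rejections are false negatives

print(f"alpha ~ {type_1_rate:.3f}, power ~ {power:.3f}, beta ~ {type_2_rate:.3f}")
```

The simulated Type I error rate should come out near the 0.05 that was built into the decision rule. The power depends on the sample size and on how large the true difference is.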

What is a p-value?

  • Let t =
    • (\(\bar{X}_1-\bar{X}_2\)) / se, or
    • (\(\hat{p}_1-\hat{p}_2\)) / se, or
    • r / se
  • p-value = Prob of sample result, t, or a result more extreme,
    • assuming the null hypothesis is true
  • Small p-value, reject \(H_0\)
  • Large p-value, accept \(H_0\)
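
A sketch of a two-sided p-value for the comparison of two proportions. It swaps in a large-sample normal approximation with a pooled standard error, which is an assumption not stated in the slides, and the counts (30 of 100 versus 20 of 100) are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(events1, n1, events2, n2):
    """Return the test statistic and its two-sided p-value."""
    p1, p2 = events1 / n1, events2 / n2
    pooled = (events1 + events2) / (n1 + n2)  # estimate under H0: pi_1 = pi_2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Probability of a result this extreme or more extreme, in either direction,
    # assuming the null hypothesis is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_proportion_p_value(30, 100, 20, 100)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

With these made-up counts the p-value is a little above 0.10, so a difference of 10 percentage points is not very surprising under the null hypothesis with only 100 patients per group.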

Alternate interpretations

  • Consistency between the data and the null
    • Small value, inconsistent
    • Large value, consistent
  • Evidence against the null
    • Small, lots of evidence against the null
    • Large, little evidence against the null

What the p-value is not (1/2)

  • A p-value is NOT the probability that the null hypothesis is true.
    • P[t or more extreme | null] is different than
    • P[null | t or more extreme]
      • P[null] is nonsensical
      • \(\mu\), \(\pi\), or \(\rho\) are unknown constants

What the p-value is not (2/2)

  • Not a measure FOR either hypothesis
    • Little evidence against the null \(\ne\) lots of evidence for the null
  • Not very informative if it is large
    • Need a power calculation, or
    • Narrow confidence interval
  • Not very helpful for huge data sets

Pop quiz, revisited

A research paper computes a p-value of 0.45. How would you interpret this p-value?

  1. Strong evidence for the null
  2. Strong evidence for the alternative
  3. Little or no evidence for the null
  4. Little or no evidence for the alternative
  5. More than one answer above is correct.
  6. I do not know the answer.

Figure 1: xkcd cartoon about jelly beans and cancer

What is p-hacking?

  • Abuse of the hypothesis testing framework.
    • Run multiple tests on the same outcome
    • Test multiple outcome measures
    • Remove outliers and retest
  • Defenses against p-hacking
    • Bonferroni
    • Primary versus secondary
    • Published protocol
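
The arithmetic behind testing multiple outcomes and behind the Bonferroni defense can be sketched as follows. The calculation assumes the tests are independent, which is a simplification; the function names are illustrative.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the overall error rate at or below alpha."""
    return alpha / n_tests

n_tests = 20
uncorrected = familywise_error_rate(n_tests)
threshold = bonferroni_threshold(n_tests)
corrected = familywise_error_rate(n_tests, alpha=threshold)

print(f"20 tests at 0.05 each: {uncorrected:.2f} chance of a false positive")
print(f"Bonferroni threshold: {threshold:.4f}, overall rate: {corrected:.3f}")
```

With 20 tests at the usual 0.05 level, the chance of at least one false positive is roughly 64%. Dividing the threshold by the number of tests brings the overall rate back under 5%.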

What is a confidence interval?

  • Range of plausible values
    • Tries to quantify uncertainty associated with the sampling process.
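
A minimal sketch of how a range of plausible values is often computed: the estimate plus or minus a multiple of its standard error. This uses a normal approximation, and the estimate (10) and standard error (2) are hypothetical numbers chosen only for illustration.

```python
from statistics import NormalDist

def confidence_interval(estimate, se, level=0.95):
    """Return (lower, upper) limits using a normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return estimate - z * se, estimate + z * se

lower, upper = confidence_interval(estimate=10, se=2)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

A larger sample shrinks the standard error, which narrows the interval.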

Example of a confidence interval

  • Homeopathic treatment of swelling after oral surgery
    • 95% CI: -5.5 to 7.5 mm
    • Lokken P, Straumsheim PA, Tveiten D, Skjelbred P, Borchgrevink CF. Effect of homoeopathy on pain and other events after acute trauma: placebo controlled trial with bilateral oral surgery BMJ. 1995;310(6992):1439-1442.

Confidence interval interpretation (1 of 7)

Figure 2: Interval that contains the null value

Confidence interval interpretation (2 of 7)

Figure 3: Interval entirely above the null value

Confidence interval interpretation (3 of 7)

Figure 4: Interval entirely below the null value

Confidence interval interpretation (4 of 7)

Figure 5: Interval entirely inside the range of clinical indifference

Confidence interval interpretation (5 of 7)

Figure 6: Interval partly inside/outside range of clinical indifference

Quiz question, revisited

A research paper computes a confidence interval for a relative risk of 0.82 to 3.94. This confidence interval tells you that the result is

  1. statistically significant and clinically important.
  2. not statistically significant, but is clinically important.
  3. statistically significant, but not clinically important.
  4. not statistically significant, and not clinically important.
  5. The result is ambiguous.
  6. I do not know the answer.

Confidence interval interpretation (6 of 7)

Figure 7: Confidence interval that contains the null value

Confidence interval interpretation (7 of 7)

Figure 8: Confidence interval entirely outside the range of clinical indifference
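
The comparisons in these figures can be sketched as a small function that checks an interval against the null value and against a range of clinical indifference. The range of clinical indifference is a clinical judgement, not a statistical one; the 0.8 to 1.25 used below for a relative risk is purely an assumption for illustration.

```python
def interpret_interval(lower, upper, null_value, indiff_low, indiff_high):
    """Return (excludes_null, position relative to the range of clinical indifference)."""
    excludes_null = not (lower <= null_value <= upper)
    if lower >= indiff_low and upper <= indiff_high:
        position = "entirely inside"
    elif upper < indiff_low or lower > indiff_high:
        position = "entirely outside"
    else:
        position = "partly inside/outside"
    return excludes_null, position

# Relative risk interval from the quiz question; the null value for a ratio is 1.
# The indifference range of 0.8 to 1.25 is a hypothetical choice.
result = interpret_interval(0.82, 3.94, null_value=1.0,
                            indiff_low=0.8, indiff_high=1.25)
print(result)
```

Under this assumed range, the quiz interval contains the null value and also extends well beyond the range of clinical indifference, so it can rule out neither "no effect" nor "a large effect".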

Why you might prefer a confidence interval

  • Provides the same information as a p-value, plus
    • Information about clinical importance

What sample size do you need?

  • Sackett’s formula
  • Rules of thumb
  • Confidence interval width
  • Power calculations
  • Post hoc power - never!
  • Effect sizes - never!

Figure 9: Sackett 2001, PMID:

Figure 10: Formula found in Sackett 2001

Rules of thumb

  • Rule of 50
    • Only for binary outcomes
    • Total sample size is irrelevant
    • Strive for 25 to 50 events in each group
  • Rule of 16
    • \(ES = (\mu_1-\mu_2)/\sigma\)
    • \(n = 16 / ES^2\)
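
The Rule of 16 is easy to apply by hand, but a short sketch makes the steps explicit. The rule gives the sample size per group and corresponds to roughly 80% power with a two-sided 5% test. The means and standard deviation below are hypothetical.

```python
import math

def rule_of_16(mu1, mu2, sigma):
    """Approximate sample size per group: n = 16 / ES^2."""
    effect_size = (mu1 - mu2) / sigma
    return math.ceil(16 / effect_size ** 2)

# Hypothetical planning values: a 5 unit difference with a standard deviation of 10.
n_per_group = rule_of_16(mu1=25, mu2=20, sigma=10)
print(n_per_group)  # 64 per group
```

Halving the difference you want to detect quadruples the required sample size, because the effect size is squared.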

Confidence interval width

  • How narrow do you want your confidence interval?
    • Algebraic solution
      • \(\pm t_{0.975}SE=\pm 15\)
      • \(SE=S_p \sqrt{1/n_1+1/n_2}\)
      • Solve for \(n_1\) and \(n_2\)
      • Usually assume \(n_1 = n_2\)
    • Trial and error
    • Trial and error
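
A sketch of the algebraic solution, assuming equal group sizes and replacing \(t_{0.975}\) with the normal value of about 1.96 (the t value depends on n, which is why trial and error is sometimes needed). The target half-width of 15 comes from the slide; the pooled standard deviation of 40 is hypothetical.

```python
import math
from statistics import NormalDist

def half_width(pooled_sd, n, level=0.95):
    """Half-width of the interval for a difference in means, with n per group."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * pooled_sd * math.sqrt(2 / n)

def n_per_group_for_half_width(pooled_sd, target, level=0.95):
    """Solve z * Sp * sqrt(2/n) = target for n, rounding up."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil(2 * (z * pooled_sd / target) ** 2)

n = n_per_group_for_half_width(pooled_sd=40, target=15)
print(n, round(half_width(40, n), 2))
```

Checking the half-width at the chosen n, and at one fewer, is a quick way to confirm that the rounding went in the right direction.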

Power calculations

  • Need to specify \(D=\mu_1-\mu_2\)
  • Power = \(P[\text{Reject } H_0\ |\ D]=0.9\)
  • = \(P[\bar{X}_1-\bar{X}_2 > t_{0.975} SE\ |\ D]=0.9\)
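
A sketch of the power calculation, assuming a normal approximation in place of the t distribution, equal group sizes, and (like the formula above) only the upper rejection region. The difference D = 5 and standard deviation of 10 are hypothetical planning values.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def power(d, sigma, n, alpha=0.05):
    """Approximate P[reject H0 | D = d] with n patients per group."""
    se = sigma * math.sqrt(2 / n)
    return Z.cdf(d / se - Z.inv_cdf(1 - alpha / 2))

def n_per_group(d, sigma, target_power=0.9, alpha=0.05):
    """Smallest n per group whose approximate power reaches the target."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(target_power)
    return math.ceil(2 * (z_sum * sigma / d) ** 2)

n = n_per_group(d=5, sigma=10)
print(n, round(power(5, 10, n), 3))
```

Dedicated sample size software uses the t distribution and will usually ask for slightly more patients than this approximation does.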

Formal software for sample size calculations

Figure 11: Lenth Power and Sample Size software

Post hoc power, effect sizes - never!

  • Power must be calculated prior to data collection
  • Effect sizes do not reflect clinical judgement

Criticisms of hypothesis testing (1 of 4)

  • Criticisms of the binary hypothesis
    • Dichotomy is simplistic
    • Point null is never true
    • Cannot prove the null
  • Possible remedy
    • \(H_0:\ -\Delta \le \mu_1-\mu_2 \le \Delta\)

Criticisms of hypothesis testing (2 of 4)

  • Criticisms of the p-value
    • Not intuitive, easily misunderstood
    • “results more extreme”
    • Ignores clinical importance
    • Does not measure uncontrolled biases

Criticisms of hypothesis testing (3 of 4)

  • General criticisms
    • Too hard to reject \(H_0\)
    • Too easy to reject \(H_0\)
    • Too reliant on a single study
    • Thoughtless application

Criticisms of hypothesis testing (4 of 4)

Figure 12: Cartoon showing interpretation of various p-values

What should you do if you do not have a hypothesis to test?

  • Descriptive statistics
    • Include confidence intervals
  • Qualitative data analysis

Bayesian example

Albert 1995

Bayes rule

  • \(P(H|E) = P(E|H) P(H) / P(E)\)
    • H = hypothesis
    • E = evidence (data)
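
Bayes rule for a short list of competing hypotheses can be sketched in a few lines: multiply each prior by its likelihood, then divide by the total. The three priors and likelihoods below are made-up numbers, not values from the example in the slides.

```python
def posterior(priors, likelihoods):
    """Apply Bayes rule: P(H|E) = P(E|H) P(H) / P(E) for each hypothesis."""
    products = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(products)  # P(E), the sum over all hypotheses
    return [x / total for x in products]

# Hypothetical values for three competing hypotheses.
priors = [0.25, 0.50, 0.25]       # P(H), must sum to 1
likelihoods = [0.10, 0.30, 0.60]  # P(E|H), need not sum to 1

post = posterior(priors, likelihoods)
print([round(p, 3) for p in post])
```

The second hypothesis starts with twice the prior of the third, but the data favor the third by the same factor, so the two end up with equal posterior probability.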

Prior

  • P[H] is prior
    • Subjective prior
      • Contrast optimistic/pessimistic perspectives
      • Incorporate prior knowledge
    • Flat (non-informative) prior
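
To see how much the choice of prior matters, this sketch runs the same data through a flat, an optimistic, and a pessimistic prior. Everything here is hypothetical: three candidate success rates (0.2, 0.5, 0.8), data of 7 successes in 10 patients, and prior weights chosen only to contrast the perspectives.

```python
from math import comb

def binomial_likelihood(p, successes, n):
    """P(data | success rate p) for a binomial outcome."""
    return comb(n, successes) * p ** successes * (1 - p) ** (n - successes)

def posterior(priors, likelihoods):
    """Prior times likelihood, rescaled to sum to 1."""
    products = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(products)
    return [x / total for x in products]

rates = [0.2, 0.5, 0.8]  # candidate success rates (the hypotheses)
likelihoods = [binomial_likelihood(p, 7, 10) for p in rates]

priors = {
    "flat": [1 / 3, 1 / 3, 1 / 3],
    "optimistic": [0.1, 0.3, 0.6],
    "pessimistic": [0.6, 0.3, 0.1],
}
posteriors = {name: posterior(p, likelihoods) for name, p in priors.items()}

for name, post in posteriors.items():
    print(name, [round(x, 3) for x in post])
```

With only 10 patients, the optimist and the pessimist still disagree after seeing the data. With more data the likelihood dominates and the posteriors move closer together.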

Figure 13: Empty table for prior probabilities

Figure 14: Empty table for prior probabilities

Figure 15: Table with diagonal priors

Figure 16: Table with lower triangle of prior probabilities

Figure 17: Table with upper triangle of prior probabilities

Figure 18: Complete table of prior probabilities

Figure 19: Table of likelihoods

Figure 20: Product of prior and likelihood

Figure 21: Table of posterior probabilities

Figure 22: Posterior probability of equality

Table of superiority posterior probabilities

Repeat the bad quiz question

A research paper computes a p-value of 0.45. How would you interpret this p-value?

  1. Strong evidence for the null hypothesis
  2. Strong evidence for the alternative hypothesis
  3. Little or no evidence for the null hypothesis
  4. Little or no evidence for the alternative hypothesis
  5. More than one answer above is correct.
  6. I do not know the answer.

Repeat the bad confidence interval question

A research paper computes a confidence interval for a relative risk of 0.82 to 3.94. What does this confidence interval tell you?

  1. The result is statistically significant and clinically important.
  2. The result is not statistically significant, but is clinically important.
  3. The result is statistically significant, but not clinically important.
  4. The result is not statistically significant, and not clinically important.
  5. The result is ambiguous.
  6. I do not know the answer.

Repeat of Bayesian question

A Bayesian data analysis can incorporate subjective opinions through the use of

  1. data shrinkage.
  2. a prior distribution.
  3. a posterior distribution.
  4. p-values.
  5. I do not know the answer.

Summary

In today’s class, you learned about

  • p-values,
  • confidence intervals,
  • justifying your sample size, and
  • Bayesian data analysis

Are there any questions?